Since the paper introduces interrupted time series (ITS) analysis as a practical method for evaluating the impact of an event, we propose to study whether ITS analysis can be applied to a different scenario: the China-United States trade war. To do so, we collect several types of datasets (e.g., U.S. Trade in Goods with China, U.S. foreign trade with product details) from official U.S. and Chinese websites. We then apply ITS analysis to these datasets to see whether the trade war had a significant impact on China-US trade. Moreover, we may try to extend the ITS method to better account for multiple events and other factors (e.g., tariffs). Visualizing the analysis will make the economic outcomes easier to understand. Apart from the general implications for exports and imports, we are also interested in other aspects of the trade war: the increasing tariffs, the different levels of impact across industries, and the resulting changes in trade with business partners such as the European Union. Together, these results should give us a deeper understanding of the impacts of the trade war, which we will try to interpret from different perspectives.

# All the packages used
import numpy as np
import pandas as pd
import datetime
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse
import plotly.graph_objects as go
import plotly.express as px
import plotly
from plotly.subplots import make_subplots
from plotly.graph_objects import layout
import geopandas as gpd
from pycountry_convert import country_name_to_country_alpha3
from plotly.offline import iplot
import plotly.offline as pyo
import plotly.tools as tls
%matplotlib inline
pd.options.display.max_rows = 10
In the following analysis, some constants and functions are used frequently for data loading and interrupted time series (ITS) analysis. We define them here to keep the work simpler and clearer. Each function is explained in detail in its docstring.
# Constant definition
BILATERAL_TRADE_PATH = "./data/trade.csv"
GLOBAL_TRADE_PATH = "./data/oecd_imts_data.csv"
def add_its_features(df, time_col_name, intervention_time):
    """
    Extend a pandas dataframe with the features required in interrupted time series (ITS) analysis.
    ITS features include:
    - `time_feature` : a continuous variable counting time points from the start of the study (1) to the end of the observation period;
    - `intervention` : coded 0 for pre-intervention time points and 1 for post-intervention time points;
    - `postslope` : coded 0 up to the last point before the intervention phase and coded sequentially from 1 thereafter.
    The dataframe is expected to be sorted by `time_col_name` in ascending order.
    Parameters
    ----------
    df : pandas DataFrame
        dataframe prepared for ITS analysis
    time_col_name : string
        the column name of the time series in the dataframe
    intervention_time : string
        the time of the interrupting event
    Returns
    -------
    df_its : pandas DataFrame
        dataframe df with extended ITS features
    """
    df_its = df.copy(deep=True)
    df_its["time_feature"] = np.arange(1, len(df_its) + 1)
    # Points up to and including intervention_time are pre-intervention (0), later ones are post (1)
    is_post = df_its[time_col_name] > intervention_time
    df_its["intervention"] = is_post.astype(int)
    # 0 before the intervention, then 1, 2, 3, ... (relies on rows being sorted by time)
    df_its["postslope"] = is_post.cumsum() * is_post
    return df_its
def plot_its_result(df, reg_res, time_col_name, target_col_name, intervention_time, title):
    """
    Plot the ITS regression analysis with its two time periods: pre-intervention and post-intervention.
    Parameters
    ----------
    df : pandas DataFrame
        dataframe prepared for ITS analysis
    reg_res : statsmodels RegressionResultsWrapper
        the regression result of ITS, including the coefficients
    time_col_name : string
        the column name of the time series in the dataframe
    target_col_name : string
        the column name of the target variable
    intervention_time : string
        the time of the interrupting event
    title : string
        the title of the plot
    """
    # Set the plotting format
    sns.set_style("ticks")
    plt.figure(figsize=(12, 6))
    # Retrieve the coefficients of the segmented regression model
    beta_0, beta_2, beta_1, beta_3 = reg_res.params  # intercept, intervention, time_feature, postslope
    # Generate datapoints for the pre-period
    pre = df[df[time_col_name] <= intervention_time]
    pre_month_num = len(pre)
    X_plot_pre = np.linspace(1, pre_month_num, 100)
    Y_plot_pre = beta_0 + beta_1 * X_plot_pre
    # Generate datapoints for the post-period
    X_plot_post = np.linspace(pre_month_num + 1, len(df), 100)
    Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post - pre_month_num)
    # Visualization: use a numeric x axis so the points line up with the regression lines
    # (sns.pointplot treats x as categorical and would place the points at 0..n-1)
    g = plt.gca()
    g.plot(df["time_feature"], df[target_col_name], marker="o",
           color="black", label=target_col_name + " (By Month)")
    # Set the axis and format
    g.set_title(title, loc="left", fontsize=14, weight="bold")
    g.set_xlabel("Time (Months)")
    g.set_xticks(list(range(1, len(df) + 1)))
    g.set_ylabel("Total Amount (millions of U.S. dollars)")
    g.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
    # Plot the two regression lines (pre/post)
    plt.plot(X_plot_pre, Y_plot_pre, color="black", label="Trend Pre-Trade War")
    plt.plot(X_plot_post, Y_plot_post, color="gray", label="Trend Post-Trade War")
    # Mark the position of the intervention
    plt.axvline(pre_month_num + 0.5, color="black", linestyle="--")
    plt.text(pre_month_num + 2.5, max(df[target_col_name]), intervention_time, ha="center")
    plt.legend()
    plt.show()
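To make the roles of the three ITS features concrete, here is a minimal sketch on synthetic data (not the trade data). It assumes a made-up series with a known level drop and slope change at the intervention, and fits the segmented regression with plain NumPy least squares instead of statsmodels. The names `n_pre`, `true_beta` and `beta_hat` are illustrative only.

```python
import numpy as np

# Synthetic monthly series: 24 pre-intervention and 24 post-intervention points
n_pre, n_post = 24, 24
n = n_pre + n_post

# ITS features, coded the same way as in add_its_features
time_feature = np.arange(1, n + 1)                                      # 1, 2, ..., n
intervention = (time_feature > n_pre).astype(float)                     # 0 before, 1 after
postslope = np.where(time_feature > n_pre, time_feature - n_pre, 0.0)   # 0, ..., 0, 1, 2, ...

# Made-up "true" coefficients: intercept, pre-slope, level change, slope change
true_beta = np.array([1000.0, 10.0, -200.0, -15.0])

# Design matrix with columns [1, time_feature, intervention, postslope]
X = np.column_stack([np.ones(n), time_feature, intervention, postslope])
rng = np.random.default_rng(0)
y = X @ true_beta + rng.normal(scale=5.0, size=n)

# Ordinary least squares fit of the segmented regression
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, b in zip(["intercept", "time_feature", "intervention", "postslope"], beta_hat):
    print(f"{name:>13}: {b:8.2f}")
```

The fitted values should land close to the made-up coefficients: the `intervention` coefficient captures the immediate level change, and the `postslope` coefficient captures the change in slope relative to the pre-intervention trend.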
The core question of our research is the trade war's direct impact on bilateral trade, i.e., on the amounts of exports and imports. First, we load the data from the United States Census website (data source):
bilateral_df = pd.read_csv(BILATERAL_TRADE_PATH)
bilateral_df
# Plot the trend of the trade
x = bilateral_df['time'].values.tolist()
imports = bilateral_df['imports'].values.tolist()
exports = bilateral_df['exports'].values.tolist()
# Fill the area between exports and imports
fig, ax = plt.subplots(1, 1, figsize=(12,8))
ax.fill_between(x, y1=imports, y2=0, label="US imports", alpha=0.5, color='tab:red', linewidth=2)
ax.fill_between(x, y1=exports, y2=0, label="US exports", alpha=0.5, color='tab:blue', linewidth=2)
# Figure format setting
ax.set_title('Bilateral Trade Trend of China-US, 2016-2020', fontsize=14)
ax.set(ylim=[0, 55000])
ax.legend(loc='best', fontsize=12)
plt.xticks(x[::5], fontsize=10, horizontalalignment='center')
plt.yticks(np.arange(5000, 55000, 5000), fontsize=10)
plt.xlim(x[0], x[-1])
# Draw Tick lines
for y in np.arange(5000, 55000, 5000):
    plt.hlines(y, xmin=0, xmax=len(x), colors='black', alpha=0.3, linestyles="--", lw=0.5)
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)
plt.show()
Given the plot above, we observe that:
(1) U.S. imports from China are much higher than U.S. exports to China. The U.S. deficit in bilateral trade is approximately 30 billion dollars per month.
(2) U.S. imports from China fluctuate greatly from month to month and show a certain degree of cyclicality.
(3) From 2020 onward, the monthly trade volume changes sharply.
First, based on observation (3), we infer that the COVID-19 pandemic greatly affected bilateral trade. In addition, on January 15, 2020, U.S. President Donald Trump and China's Vice Premier Liu He signed the US–China Phase One trade deal in Washington, D.C. (source). This agreement marked a phased settlement of the trade war.
Given these two factors, we think that the 2020 trade data does not help us investigate the trade war's impact, because the confounding effects are hard to control for: the pandemic may be the cause of the sharp decrease in U.S. imports, while the agreement could explain the subsequent increasing trend. Hence, we focus our research on 2016-2019, which is also the main period of the trade war. Let's forget about the crazy and miserable 2020 in this research!
# Convert the datatype to datetime/numeric
bilateral_df.time = pd.to_datetime(bilateral_df.time)
bilateral_df.exports = pd.to_numeric(bilateral_df.exports)
bilateral_df.imports = pd.to_numeric(bilateral_df.imports)
bilateral_df = bilateral_df[bilateral_df.time < "2020-01"]
We need to add the ITS features required for segmented regression analysis. Given that the U.S. first imposed tariffs on Chinese goods in March 2018, we choose that month as the trade war intervention in our analysis.
# Add ITS features (time_feature, intervention, postslope)
bilateral_its_df = add_its_features(bilateral_df, "time", "2018-03")
bilateral_its_df
With the preprocessed data, we can now use segmented regression analysis to investigate the impact of the trade war on bilateral trade. Specifically, we first look at U.S. exports to China and U.S. imports from China:
# Declare the model for exports segmented regression analysis
model_exports = smf.ols(formula='exports ~ time_feature + C(intervention) + postslope', data=bilateral_its_df)
# Fit the model (OLS has a closed-form solution, so the fit is deterministic and needs no random seed)
res_exports = model_exports.fit()
# Print the summary output
print(res_exports.summary())
# Declare the model for imports segmented regression analysis
model_imports = smf.ols(formula='imports ~ time_feature + C(intervention) + postslope', data=bilateral_its_df)
# Fits the model
res_imports = model_imports.fit()
# Print the summary output
print(res_imports.summary())
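For reference, the formula `y ~ time_feature + C(intervention) + postslope` used above corresponds to the standard segmented regression model. The symbols below are our own notation (an assumption, not taken from the notebook), chosen to match the `beta_0` ... `beta_3` naming in `plot_its_result`:

```latex
Y_t = \beta_0 + \beta_1 T_t + \beta_2 X_t + \beta_3 P_t + \varepsilon_t
```

Here $T_t$ is `time_feature`, $X_t$ is `intervention` and $P_t$ is `postslope`. Under this reading, $\beta_1$ is the pre-intervention slope, $\beta_2$ is the immediate level change at the intervention, and $\beta_3$ is the change in slope, so the slope after the intervention is $\beta_1 + \beta_3$.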
From the regression results, we can see that the trade war has a statistically significant impact on U.S. exports to China, both immediately and in the long term. Before the trade war, exports increased steadily, with a time_feature coefficient of around 129.18. However, the intervention coefficient is -2026.33 and the postslope coefficient is -191.62, indicating that the intervention not only reduced the export value immediately but was also followed by a downward trend over the next two years.
The result for U.S. imports from China differs somewhat from that for exports. Imports from China show both a steeper increasing trend before the trade war (coefficient 340.74) and a more severe downward change in trend afterwards (coefficient -844.34). However, the regression suggests that the outbreak of the trade war led to an immediate increase (coefficient 2042.09), although this is not statistically significant (p-value 0.433).
Are these results reliable and trustworthy? We need some further investigation to strengthen the analysis!
Remember that we observed an obvious seasonal pattern in the imports trend. To analyze the intervention's impact, we need to remove the seasonality from the time series. seasonal_decompose breaks the time series down into trend, seasonal and residual components. We plot the trend, seasonality and residual of both imports and exports for comparison:
# Decompose
dates = pd.DatetimeIndex([d for d in bilateral_its_df['time']])
bilateral_its_df.set_index(dates, inplace=True)
result_imports = seasonal_decompose(bilateral_its_df['imports'], model='additive', period=12, extrapolate_trend='freq')
# Plot
plt.rcParams.update({'figure.figsize': (10,10)})
result_imports.plot().suptitle('Time Series Decomposition of US Imports', x=0.5, y=0, fontsize=14)
plt.show()
# Decompose
dates = pd.DatetimeIndex([d for d in bilateral_its_df['time']])
bilateral_its_df.set_index(dates, inplace=True)
result_exports = seasonal_decompose(bilateral_its_df['exports'], model='additive', period=12, extrapolate_trend='freq')
# Plot
plt.rcParams.update({'figure.figsize': (10,10)})
result_exports.plot().suptitle('Time Series Decomposition of US Exports', x=0.5, y=0, fontsize=14)
plt.show()
The seasonal pattern in U.S. imports is regular and clear, as we observed in the previous plots. U.S. exports show weaker seasonality than imports. Removing the seasonal component from both series makes the regression analysis more sound. Note that the trend components in the plots above already reveal the impact of the trade war, but we need regression to quantify it.
# Add the seasonally adjusted series (trend + residual) to the dataframe
bilateral_its_df["imports_trend"] = result_imports.trend + result_imports.resid
bilateral_its_df["exports_trend"] = result_exports.trend + result_exports.resid
# Declare the model for imports segmented regression analysis
model_imports = smf.ols(formula='imports_trend ~ time_feature + C(intervention) + postslope', data=bilateral_its_df)
# Fits the model
res_imports_bilateral = model_imports.fit()
# Print the summary output
print(res_imports_bilateral.summary())
# Declare the model for exports segmented regression analysis
model_exports = smf.ols(formula='exports_trend ~ time_feature + C(intervention) + postslope', data=bilateral_its_df)
# Fits the model
res_exports_bilateral = model_exports.fit()
# Print the summary output
print(res_exports_bilateral.summary())
plot_its_result(bilateral_its_df, res_exports_bilateral, "time", "exports_trend", "2018-03", "Pre and Post Trade War Bilateral Exports (U.S. to China) Trend")
plot_its_result(bilateral_its_df, res_imports_bilateral, "time", "imports_trend", "2018-03", "Pre and Post Trade War Bilateral Imports (China to US) Trend")
After removing the seasonality from the trade data, all the coefficients are now statistically significant (p-value < 0.05). Moreover, the R-squared (i.e., the coefficient of determination) of the regressions has increased considerably, indicating that the model fits the seasonally adjusted data better than the raw data.
Now that we have removed the seasonality from the data, can we draw the final conclusion? Not really. We have not yet selected a comparator group to strengthen our results. If the intervention's impact is obvious in the treatment group but absent in the control group, we can draw a more convincing conclusion.
Since China-US is the most important bilateral trade relationship in the world, any other bilateral trade (e.g., with the EU or Japan) could be deeply affected by it. We think that using the global trade amount as the comparator group is a reasonable choice. It helps us exclude the impact of other global events, such as a worldwide financial crisis or the COVID-19 pandemic.
In this section, we use data from the Organisation for Economic Co-operation and Development (OECD) (data source).
We export the "Monthly International Merchandise Trade" (IMTS) series from 2016-01 to 2020-10. Then we load the data and extract the information we need:
# Load original csv data
oecd_df = pd.read_csv(GLOBAL_TRADE_PATH)
oecd_df
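# Select OECD-Total and Non-OECD countries
# ("South Arabia" matched no country and was silently dropped by isin; "Saudi Arabia" is assumed to be the intended label)
country_queries = ["OECD - Total", "Argentina", "Brazil", "China (People's Republic of)", "Costa Rica", "India",
                   "Indonesia", "Russia", "Saudi Arabia", "South Africa"]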
subject_choices = ["Imports in goods (value)", "Exports in goods (value)"]
measure = ["US-Dollar converted, Seasonally adjusted"]
target_df = oecd_df[oecd_df["Country"].isin(country_queries) & oecd_df["Subject"].isin(subject_choices) & oecd_df["Measure"].isin(measure)]
target_df
# Calculate the monthly amount of global trend
global_monthly_df = target_df.groupby(["Subject", "Time"]).aggregate({'Value':'sum'})
global_monthly_df = global_monthly_df.unstack(level=0)["Value"]
global_monthly_df.index = pd.to_datetime(global_monthly_df.index.map(lambda x : datetime.datetime.strptime(x, '%b-%y')))
global_monthly_df = global_monthly_df.sort_values(by='Time')
global_monthly_df = global_monthly_df[global_monthly_df.index < "2020-01"]
global_monthly_df
# Plot the time series (global trade trend)
fig, ax = plt.subplots(1,1,figsize=(12, 8))
y_LL = 900
y_UL = 1400
y_interval = 50
plt.plot(global_monthly_df.index.values, global_monthly_df["Exports in goods (value)"].values, lw=1.5, color='tab:red', label="Global Exports")
plt.plot(global_monthly_df.index.values, global_monthly_df["Imports in goods (value)"].values, lw=1.5, color='tab:blue', label="Global Imports")
# Decorations
plt.tick_params(axis="both", which="both", bottom=False, top=False,
labelbottom=True, left=False, right=False, labelleft=True)
# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)
plt.title('International Merchandise Trade Trend', fontsize=16)
plt.yticks(range(y_LL, y_UL, y_interval), [str(y) for y in range(y_LL, y_UL, y_interval)], fontsize=12)
plt.ylim(y_LL, y_UL)
plt.legend()
plt.show()
Interestingly, we observe that the increasing trend of International Merchandise Trade amount also slowed down and gradually decreased after 2018. We will apply segmented regression analysis to see the trend's relationship with the trade war:
global_monthly_df["time"] = global_monthly_df.index.values
# Add ITS features (time_feature, intervention, postslope)
global_its_df = add_its_features(global_monthly_df, "time", "2018-03")
global_its_df = global_its_df.rename(columns={'Exports in goods (value)': 'exports', 'Imports in goods (value)': 'imports'}, errors="raise")
global_its_df
# Declare the model for exports segmented regression analysis
model_exports = smf.ols(formula='exports ~ time_feature + C(intervention) + postslope', data=global_its_df)
# Fits the model
res_exports_global = model_exports.fit()
# Print the summary output
print(res_exports_global.summary())
# Declare the model for imports segmented regression analysis
model_imports = smf.ols(formula='imports ~ time_feature + C(intervention) + postslope', data=global_its_df)
# Fits the model
res_imports_global = model_imports.fit()
# Print the summary output
print(res_imports_global.summary())
Clearly, global trade did not immediately experience negative effects from the trade war, since the intervention coefficients are positive (21.12 and 34.23, respectively). However, a long-term negative effect does appear in the regression results (the postslope coefficients are negative).
Does this overturn our previous conclusion about the China-US trade trend?
Not exactly, because the long-term effect is less pronounced than the impact we found in the China-US trade relationship:
# Compare the relative long-term impact
print("[Imports China-US] The ratio of postslope post trade war to slope pre trade war is %.2f."% abs(-905.9217/304.9529))
print("[Imports Global] The ratio of postslope post trade war to slope pre trade war is %.2f."% abs(-15.3932/11.4242))
print("")
print("[Exports US-China] The ratio of postslope post trade war to slope pre trade war is %.2f."% abs(-213.6439/105.8624))
print("[Exports Global] The ratio of postslope post trade war to slope pre trade war is %.2f."% abs(-12.6810/10.0023))
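The ratios above are typed in by hand from the regression summaries. A hedged alternative is to compute them from the coefficients by name, which avoids copy errors when the data changes. The sketch below uses plain dictionaries holding the same numbers as the print statements above; with statsmodels results, `reg_res.params` can be indexed by the same labels (`"time_feature"`, `"postslope"`). The helper name `slope_ratio` is hypothetical.

```python
def slope_ratio(params):
    """Return |postslope coefficient| / |pre-intervention slope|."""
    return abs(params["postslope"] / params["time_feature"])


# Coefficients copied from the print statements above
coefficients = {
    "Imports China-US": {"time_feature": 304.9529, "postslope": -905.9217},
    "Imports Global": {"time_feature": 11.4242, "postslope": -15.3932},
    "Exports US-China": {"time_feature": 105.8624, "postslope": -213.6439},
    "Exports Global": {"time_feature": 10.0023, "postslope": -12.6810},
}

for label, params in coefficients.items():
    print("[%s] The ratio of postslope to pre-trade-war slope is %.2f." % (label, slope_ratio(params)))
```

Note that this ratio compares the change in slope with the pre-intervention slope; the slope of the post-intervention line itself is the sum of the two coefficients.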
The plots below better reflect the difference in both the immediate and the long-term impact on China-US bilateral trade and on the international market:
# [Imports] Plot the compared two time series (China-US vs. Global) with different scales
x = bilateral_its_df["time_feature"]
y1 = bilateral_its_df["imports_trend"]
y2 = global_its_df["imports"]
# Plot China-US bilateral line
fig, ax1 = plt.subplots(1,1,figsize=(12,8))
ax1.scatter(x, y1, color='tab:red')
ax1.plot(x, y1, color='tab:red')
# Plot Global Market line
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.scatter(x, y2, color='tab:blue')
ax2.plot(x, y2, color='tab:blue')
# Plot the regression lines
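pre_month_num = int((bilateral_its_df["intervention"] == 0).sum())  # number of pre-intervention months, derived from the data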
X_plot_pre = np.linspace(1, pre_month_num, 100)
X_plot_post = np.linspace(pre_month_num+1, len(x), 100)
beta_0, beta_2, beta_1, beta_3 = res_imports_bilateral.params
Y_plot_pre = beta_0 + beta_1 * X_plot_pre
Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
ax1.plot(X_plot_pre, Y_plot_pre, color="black", label="Bilateral Trend Pre-Trade War")
ax1.plot(X_plot_post, Y_plot_post, color="black", label="Bilateral Trend Post-Trade War")
beta_0, beta_2, beta_1, beta_3 = res_imports_global.params
Y_plot_pre = beta_0 + beta_1 * X_plot_pre
Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
ax2.plot(X_plot_pre, Y_plot_pre, color="gray", label="Global Trend Pre-Trade War")
ax2.plot(X_plot_post, Y_plot_post, color="gray", label="Global Trend Post-Trade War")
# Plot the intervention line (2018-03) TODO: more elegant intervention line?
plt.axvline(pre_month_num+0.5, color="gray", linestyle="-")
plt.text(pre_month_num+0.5, 960, "2018-03: Trade War Outbreak", ha="left", fontsize=14)
# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Time', fontsize=14)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('U.S. Monthly Imports Amount from China (millions USD)', color='tab:red', fontsize=14)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
ax1.grid(alpha=.4)
# ax2 (right Y axis)
xticklabels = [i.strftime("%b-%Y") for i in bilateral_its_df["time"]]
ax2.set_ylabel("Global Monthly Imports Amount (billions USD)", color='tab:blue', fontsize=14)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_xticks(np.arange(1, len(x) + 1, 4))  # time_feature starts at 1, so the first label belongs at x=1
ax2.set_xticklabels(xticklabels[::4], rotation=90, fontdict={'fontsize':10})
ax2.set_title("China-US Bilateral Imports vs. Global Imports", fontsize=18)
fig.tight_layout()
plt.show()
# [Exports] Plot the compared two time series (China-US vs. Global) with different scales
x = bilateral_its_df["time_feature"]
y1 = bilateral_its_df["exports_trend"]
y2 = global_its_df["exports"]
# Plot China-US bilateral line
fig, ax1 = plt.subplots(1,1,figsize=(12,8))
ax1.scatter(x, y1, color='tab:red')
ax1.plot(x, y1, color='tab:red')
# Plot Global Market line
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.scatter(x, y2, color='tab:blue')
ax2.plot(x, y2, color='tab:blue')
# Plot the regression lines
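pre_month_num = int((bilateral_its_df["intervention"] == 0).sum())  # number of pre-intervention months, derived from the data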
X_plot_pre = np.linspace(1, pre_month_num, 100)
X_plot_post = np.linspace(pre_month_num+1, len(x), 100)
beta_0, beta_2, beta_1, beta_3 = res_exports_bilateral.params
Y_plot_pre = beta_0 + beta_1 * X_plot_pre
Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
ax1.plot(X_plot_pre, Y_plot_pre, color="black", label="Bilateral Trend Pre-Trade War")
ax1.plot(X_plot_post, Y_plot_post, color="black", label="Bilateral Trend Post-Trade War")
beta_0, beta_2, beta_1, beta_3 = res_exports_global.params
Y_plot_pre = beta_0 + beta_1 * X_plot_pre
Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
ax2.plot(X_plot_pre, Y_plot_pre, color="gray", label="Global Trend Pre-Trade War")
ax2.plot(X_plot_post, Y_plot_post, color="gray", label="Global Trend Post-Trade War")
# Plot the intervention line (2018-03) TODO: more elegant intervention line?
plt.axvline(pre_month_num+0.5, color="gray", linestyle="-")
plt.text(pre_month_num+0.5, 980, "2018-03: Trade War Outbreak", ha="left", fontsize=14)
# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Time', fontsize=14)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('U.S. Monthly Exports Amount to China (millions USD)', color='tab:red', fontsize=14)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
ax1.grid(alpha=.4)
# ax2 (right Y axis)
xticklabels = [i.strftime("%b-%Y") for i in bilateral_its_df["time"]]
ax2.set_ylabel("Global Monthly Exports Amount (billions USD)", color='tab:blue', fontsize=14)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_xticks(np.arange(1, len(x) + 1, 4))  # time_feature starts at 1, so the first label belongs at x=1
ax2.set_xticklabels(xticklabels[::4], rotation=90, fontdict={'fontsize':10})
ax2.set_title("China-US Bilateral Exports vs. Global Exports", fontsize=18)
fig.tight_layout()
plt.show()
Although the global economy has been in a slump since 2018, the impact of the U.S.-China trade war on their bilateral trade is much greater than the impact of the global economic downturn.
However, did the trade war between the U.S. and China cause the global economic downturn, or did a rise in protectionism cause both the trade war and the decrease in global trade? This cause-effect relationship is difficult to establish with segmented regression analysis.
Despite the limitations of segmented regression analysis, we can still observe some interesting phenomena:
There is a time delay in the immediate impact on Chinese exports to the U.S. (at the intervention the level increases rather than decreases, but it plummets a few months later), and the long-term impact is more significant: the absolute value of the postslope coefficient is roughly three times the pre-intervention slope for Chinese exports to the U.S., and roughly two times the pre-intervention slope for U.S. exports to China.
Although the U.S. data shows a large bilateral deficit with China, if we consider imports and exports separately, China loses more in the trade war: the drop in China's exports between the pre-war and post-war periods is larger than the drop in U.S. exports.
Let's move on to the next question: did the China-US trade war influence trade with their primary business partners? First, to identify the primary U.S. business partners, we look at the top U.S. trade partners using this data (data source) from the United States Census website.
# Read US import/export data
US_IE_PATH = "data/us_ie_partner.xlsx"
us_ie = pd.read_excel(US_IE_PATH)
us_ie.head()
# Split two dataframe: import and export
us_import = us_ie.iloc[:, :16]
us_export = pd.concat([us_ie.iloc[:, :3], us_ie.iloc[:, 16:29]], axis=1)
us_import.head()
us_export.head()
# To find US top trade partners, we aggregate the total trade amount each year from 1985 to 2020
# Find top five import partners (without China)
us_import_year = us_import.filter(['CTYNAME','IYR'], axis=1)
us_import_year = us_import_year.groupby('CTYNAME').sum()
us_import_year = us_import_year.drop('China')
us_import_year = us_import_year.sort_values(by=['IYR'], ascending=False)
# Find top five export partners (without China)
us_export_year = us_export.filter(['CTYNAME','EYR'], axis=1)
us_export_year = us_export_year.groupby('CTYNAME').sum()
us_export_year = us_export_year.drop('China')
us_export_year = us_export_year.sort_values(by=['EYR'], ascending=False)
# Find top trade partners (without China)
us_ie_year = pd.concat([us_import_year, us_export_year], axis=1)
us_ie_year['SUM'] = us_ie_year.IYR + us_ie_year.EYR
us_ie_year = us_ie_year.sort_values(by=['SUM'], ascending=False)
# Visualize to get a clear picture on US top ten trade partners
us_import_year_top = us_import_year[:10]
us_export_year_top = us_export_year[:10]
us_ie_year_top = us_ie_year[:10]
sns.set(rc={'figure.figsize':(15,6)})
fig = plt.figure()
# Pass x and y as keywords: recent seaborn versions no longer accept them positionally
ax1 = plt.subplot(131)
sns.barplot(x=us_import_year_top.IYR, y=us_import_year_top.index).set_title('Top 10 Import Partners', fontsize=14)
ax2 = plt.subplot(132)
sns.barplot(x=us_export_year_top.EYR, y=us_export_year_top.index).set_title('Top 10 Export Partners', fontsize=14)
ax3 = plt.subplot(133)
sns.barplot(x=us_ie_year_top.SUM, y=us_ie_year_top.index).set_title('Top 10 Total Trade Amount(Import+Export) Partners', fontsize=14)
ax1.xaxis.label.set_visible(False)
ax2.xaxis.label.set_visible(False)
ax3.xaxis.label.set_visible(False)
ax1.yaxis.label.set_visible(False)
ax2.yaxis.label.set_visible(False)
ax3.yaxis.label.set_visible(False)
fig.text(0.5, 0, 'Total Amount(Millions of US dollars)', ha='center', fontsize=14)
fig.text(0, 0.5, 'US Business partners', va='center', rotation='vertical', fontsize=14)
fig.tight_layout()
plt.show()
#fig.savefig('us_trade_partner.png')
From the above plots, we can easily identify the primary U.S. trade partners. We remove China-US trade here because we want to focus on the impact on other business partners. Canada, Mexico and Japan are the top three U.S. trade partners in terms of imports, exports and total trade amount. Moreover, the top ten partners in the three rankings are mostly the same, just in a different order.
We then look into the top ten import partners and the top ten export partners separately by applying segmented regression analysis.
# Create a dataframe for each of the top 10 import/export partners
# We focus on the data since 2016
us_import = us_import.loc[us_import['year'] >= 2016]
us_export = us_export.loc[us_export['year'] >= 2016]
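def create_us_ie_top10(CTYNAME, df, ie):
    """Reshape one partner's yearly rows (one column per month) into a monthly time series."""
    top10 = pd.DataFrame(columns=['time', ie])
    tmp = df.loc[df['CTYNAME'] == CTYNAME]
    # Relies on one row per year, sorted by year and starting in 2016
    for i, tuples in enumerate(tmp.itertuples(), 0):
        for j in range(1, 13):
            # tuples[0] is the row index; the twelve monthly values are read from positions 4..15
            top10.loc[(12 * i) + j - 1] = [str(2016 + i) + '-' + str(j), tuples[j + 3]]
    return top10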
us_import_top10 = us_import_year_top.index
us_import_df = []
us_export_top10 = us_export_year_top.index
us_export_df = []
for i in range(10):
    us_import_df.append(create_us_ie_top10(us_import_top10[i], us_import, 'imports'))
    us_export_df.append(create_us_ie_top10(us_export_top10[i], us_export, 'exports'))
us_import_df[0].head()
us_export_df[0].head()
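The nested loop in `create_us_ie_top10` reshapes yearly rows into a monthly series. As a possible alternative, the same wide-to-long reshape can be sketched with `DataFrame.melt`. The toy frame and its month column names (`M1` ... `M12`) are hypothetical; the real spreadsheet's month columns are not shown here and would need to be substituted.

```python
import pandas as pd

# Toy wide table: one row per (partner, year), one column per month (hypothetical names M1..M12)
month_cols = [f"M{m}" for m in range(1, 13)]
wide = pd.DataFrame(
    [["Canada", 2016] + list(range(1, 13)),
     ["Canada", 2017] + list(range(13, 25))],
    columns=["CTYNAME", "year"] + month_cols,
)

# Wide -> long: one row per (partner, year, month)
long = wide.melt(id_vars=["CTYNAME", "year"], value_vars=month_cols,
                 var_name="month", value_name="imports")
long["month"] = long["month"].str[1:].astype(int)  # "M7" -> 7

# Build a proper monthly timestamp and sort chronologically
long["time"] = pd.to_datetime(long[["year", "month"]].assign(day=1))
monthly = long.sort_values("time")[["time", "imports"]].reset_index(drop=True)
print(monthly.head())
```

A benefit of this form is that the year comes from the data rather than from the row position, so it does not depend on the rows being sorted or on the series starting in 2016.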
# Analysis on import partners
us_imports_model = []
us_imports_its = []
for partners in us_import_df:
    # Convert the datatype to datetime/numeric
    partners.time = pd.to_datetime(partners.time)
    partners.imports = pd.to_numeric(partners.imports)
    # Remove time >= 2020-01
    partners = partners.loc[partners['time'] < '2020-01']
    # Add ITS features (time_feature, intervention, postslope)
    its_import = add_its_features(partners, "time", "2018-03")
    # Declare the model for imports segmented regression analysis
    model_imports = smf.ols(formula='imports ~ time_feature + C(intervention) + postslope', data=its_import)
    # Fit the model (OLS is deterministic, so no random seed is needed)
    res_imports = model_imports.fit()
    us_imports_model.append(res_imports)
    us_imports_its.append(its_import)
# Print the summary output
print(us_imports_model[2].summary())
# Analysis on export partners
us_exports_model = []
us_exports_its = []
for partners in us_export_df:
    # Convert the datatype to datetime/numeric
    partners.time = pd.to_datetime(partners.time)
    partners.exports = pd.to_numeric(partners.exports)
    # Remove time >= 2020-01
    partners = partners.loc[partners['time'] < '2020-01']
    # Add ITS features (time_feature, intervention, postslope)
    its_export = add_its_features(partners, "time", "2018-03")
    # Declare the model for exports segmented regression analysis
    model_exports = smf.ols(formula='exports ~ time_feature + C(intervention) + postslope', data=its_export)
    # Fit the model (OLS is deterministic, so no random seed is needed)
    res_exports = model_exports.fit()
    us_exports_model.append(res_exports)
    us_exports_its.append(its_export)
# Print the summary output
print(us_exports_model[0].summary())
# Plot ITS to see the result on import partners
fig, axs = plt.subplots(2, 5, figsize=(40,10))
fig2 = go.Figure()
for i, ax in enumerate(fig.axes):
    # Retrieve the coefficients of the segmented regression model
    beta_0, beta_2, beta_1, beta_3 = us_imports_model[i].params  # intercept, intervention, time_feature, postslope
    # Generate datapoints for the pre-period
    pre = us_imports_its[i][us_imports_its[i]["time"] <= "2018-03"]
    pre_month_num = len(pre)
    X_plot_pre = np.linspace(1, pre_month_num, 100)
    Y_plot_pre = beta_0 + beta_1 * X_plot_pre
    # Generate datapoints for the post-period
    X_plot_post = np.linspace(pre_month_num + 1, len(us_imports_its[i]), 100)
    Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post - pre_month_num)
    # Visualization
    ax.scatter(x=us_imports_its[i]["time_feature"], y=us_imports_its[i]["imports"])
    # Set the axis and format
    ax.set_title("Bilateral Imports (from " + us_import_top10[i] + " to US) Trend", loc="center", fontsize=14, weight="bold")
    ax.set_xlabel("Time (Months)")
    ax.set_xticks(list(range(0, len(us_imports_its[i]), 6)))
    ax.set_ylabel("Total Amount (millions of U.S. dollars)")
    ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
    # Plot the two regression lines (pre/post)
    ax.plot(X_plot_pre, Y_plot_pre, color="black", label="Trend Pre-Trade War")
    ax.plot(X_plot_post, Y_plot_post, color="gray", label="Trend Post-Trade War")
    # Mark the position of the intervention
    ax.axvline(pre_month_num + 0.5, color="black", linestyle="--")
    ax.text(pre_month_num + 2.5, min(us_imports_its[i]["imports"]), "2018-03", ha="center")
plt.tight_layout()
plt.show()
#plotly_fig = tls.mpl_to_plotly(fig)
#plotly.offline.plot(plotly_fig, filename="us trade partner")
# Plot ITS to see the result on export partners
fig, axs = plt.subplots(2, 5, figsize=(40,10))
for i, ax in enumerate(fig.axes):
    # Retrieve the coefficients of the segmented regression model
    beta_0, beta_2, beta_1, beta_3 = us_exports_model[i].params # intercept, intervention, time_feature, postslope
    # Generate datapoints for the pre-period
    pre = us_exports_its[i][us_exports_its[i]["time"] <= "2018-03"]
    pre_month_num = len(pre)
    X_plot_pre = np.linspace(1, pre_month_num, 100)
    Y_plot_pre = beta_0 + beta_1 * X_plot_pre
    # Generate datapoints for the post-period
    X_plot_post = np.linspace(pre_month_num+1, len(us_exports_its[i]), 100)
    Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
    # Visualization
    ax.scatter(x=us_exports_its[i]["time_feature"], y=us_exports_its[i]["exports"])
    # Set the axis and format
    ax.set_title("Bilateral Exports (from US to "+ us_export_top10[i]+ ") Trend", loc="center", fontsize=14, weight="bold")
    ax.set_xlabel("Time (Months)")
    ax.set_xticks(list(range(0, len(us_exports_its[i]), 6)))
    ax.set_ylabel("Total Amount (millions of U.S. dollars)")
    ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p:format(int(x), ',')))
    # Plot the two regression lines (pre/post)
    ax.plot(X_plot_pre, Y_plot_pre, color="black", label="Trend Pre-Trade War")
    ax.plot(X_plot_post, Y_plot_post, color="gray", label="Trend Post-Trade War")
    # Mark the position of the intervention
    ax.axvline(pre_month_num + 0.5, color="black", linestyle="--")
    ax.text(pre_month_num + 2.5, min(us_exports_its[i]["exports"]), "2018-03", ha="center")
plt.tight_layout()
plt.show()
Now we have the segmented regression results for both import and export partners. For most of these partners, the trade amount increased around the time the trade war started.
So, how can we tell which business partner was affected by the trade war the most, and which was barely affected? It is hard to answer by looking through so many plots at once.
To take a closer look at how the impact differs between business partners, we focus on the difference between the pre-intervention and post-intervention trends for each US trade partner and see who was affected more.
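The comparison described above reduces to a single ratio: the post-intervention slope change divided by the magnitude of the pre-intervention slope. A minimal sketch follows, assuming the two inputs correspond to the `postslope` and `time_feature` coefficients of the regression formula. The helper name `relative_slope_change`, the zero-slope guard and the numbers are illustrative additions, not part of the notebook.

```python
import math

def relative_slope_change(pre_slope, post_slope_change):
    """Return post_slope_change / |pre_slope| (hypothetical helper).

    Positive values mean the trend bent upward after the intervention,
    negative values mean it bent downward.
    """
    if pre_slope == 0:
        # Assumption: an exactly flat pre-trend makes the ratio undefined.
        return math.nan
    return post_slope_change / abs(pre_slope)

# Made-up coefficients, not results from the trade data.
print(relative_slope_change(20.0, 30.0))    # 1.5
print(relative_slope_change(-20.0, 30.0))   # 1.5
print(relative_slope_change(20.0, -10.0))   # -0.5
```

Dividing by the absolute value keeps the sign of the ratio tied to the direction of the slope change, whatever the sign of the pre-intervention trend.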
# More analysis on top10 import/export partners
# We use data from ITS reports and propose a new formula:
# if time feature.coeff>0: postslope.coeff/time feature.coeff
# else: postslope.coeff/(-time feature.coeff)
import_impact = []
export_impact = []
for i in range(10):
    # params order: intercept, intervention, time_feature, postslope
    imp_params = us_imports_model[i].params
    exp_params = us_exports_model[i].params
    # impact = postslope coefficient / |time_feature coefficient|
    import_impact.append(imp_params.iloc[3] / abs(imp_params.iloc[2]))
    export_impact.append(exp_params.iloc[3] / abs(exp_params.iloc[2]))
# Visualize
data_import = {'CTYNAME': us_import_top10, 'imports': import_impact}
data_export = {'CTYNAME': us_export_top10, 'exports': export_impact}
us_import_impact = pd.DataFrame(data=data_import)
us_export_impact = pd.DataFrame(data=data_export)
us_import_impact = us_import_impact.sort_values(by=['imports'], ascending=False)
us_export_impact = us_export_impact.sort_values(by=['exports'], ascending=False)
sns.set(rc={'figure.figsize':(15,6)})
fig = plt.figure()
ax1 = plt.subplot(121)
sns.barplot(x=us_import_impact.CTYNAME, y=us_import_impact.imports, data=us_import_impact, capsize=.05, palette="Blues_r").set_title('Impact on Top 10 Import Partners', fontsize=14)
ax2 = plt.subplot(122)
sns.barplot(x=us_export_impact.CTYNAME, y=us_export_impact.exports, data=us_export_impact, capsize=.05, palette="Blues_d").set_title('Impact on Top 10 Export Partners', fontsize=14)
ax1.set_xticklabels(us_import_impact.CTYNAME, rotation=45)
ax2.set_xticklabels(us_export_impact.CTYNAME, rotation=45)
ax1.xaxis.label.set_visible(False)
ax2.xaxis.label.set_visible(False)
ax1.yaxis.label.set_visible(False)
ax2.yaxis.label.set_visible(False)
fig.text(0.5, 0, 'Business Partners', ha='center', fontsize=14)
fig.text(-0.01, 0.5, 'Trade War Impact', va='center', rotation='vertical', fontsize=14)
#ax1.axvline(pre_month_num + 0.5, color="black", linestyle="--")
#ax2.axvline(pre_month_num + 0.5, color="black", linestyle="--")
fig.tight_layout()
plt.show()
#fig.savefig('us_impact.png')
From the above plot, we can easily compare the trade war impact on different business partners. Among US import partners, South Korea and Taiwan show a clearly positive impact (impact > 1.5), while the other partners show only a slightly positive or negative impact. On the other side, most US export partners show a small negative impact. Among them, Mexico and Canada show the most negative impact on their trade amount (magnitude > 1.5).
Let us now turn to China's trade partners, using this data (data source) from the website of the General Administration of Customs of China.
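The customs spreadsheets store amounts as text, with `-` standing for a missing value and commas as thousands separators, expressed in thousands of US dollars. The conversion that the loading code applies to each column can be sketched in plain Python as follows. The helper name `to_millions` is hypothetical and the sample strings are invented.

```python
def to_millions(raw):
    """Convert a customs amount string (thousands of USD) to millions of USD."""
    text = str(raw).strip()
    if text == "-":
        # A lone dash marks a missing value; treat it as zero.
        return 0.0
    # Remove thousands separators before parsing.
    return float(text.replace(",", "")) / 1000

samples = ["1,234,567", "-", "890"]
print([to_millions(s) for s in samples])
```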
# Read China import/export data
CHINA_IE_PATH = 'data/china_ie_partner/'
yr = ['2016','2017','2018','2019']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
frame = []
for i in yr:
    for j in month:
        frame.append(pd.read_excel(CHINA_IE_PATH + i + "-" + j + ".xlsx"))
# combine all months into one dataframe
china_ie = pd.concat(frame)
# Clean the data and convert the columns to numeric (进口 = imports, 出口 = exports, 进出口 = imports + exports)
china_ie['进口'] = china_ie['进口'].replace('-', '0')
china_ie['进口'] = china_ie['进口'].replace(',', '', regex=True)
china_ie['进口'] = np.float64(china_ie['进口'])
china_ie['出口'] = china_ie['出口'].replace('-', '0')
china_ie['出口'] = china_ie['出口'].replace(',', '', regex=True)
china_ie['出口'] = np.float64(china_ie['出口'])
china_ie['进出口'] = china_ie['进出口'].replace('-', '0')
china_ie['进出口'] = china_ie['进出口'].replace(',', '', regex=True)
china_ie['进出口'] = np.float64(china_ie['进出口'])
# change the unit from thousands of US dollars to millions of US dollars
china_ie['进口'] = china_ie['进口']/1000
china_ie['出口'] = china_ie['出口']/1000
china_ie['进出口'] = china_ie['进出口']/1000
china_ie.head()
# Split two dataframe: import and export
china_import = pd.concat([china_ie.iloc[:, :1], china_ie.iloc[:, 3:6]], axis=1)
china_export = pd.concat([china_ie.iloc[:, :1], china_ie.iloc[:, 2:3],china_ie.iloc[:, 4:6]], axis=1)
# Change column name
china_import.columns = ['CTYNAME', 'imports', 'year', 'month']
china_export.columns = ['CTYNAME', 'exports', 'year', 'month']
china_import.head()
china_export.head()
# To find China's top trade partners, we aggregate the total trade amount over the whole period
# Find top import partners (without US)
china_import_year = china_import.filter(['CTYNAME', 'imports'], axis=1)
china_import_year = china_import_year.groupby('CTYNAME').sum()
# drop data that is a region(ex. africa, south america)/US
china_import_year = china_import_year.drop('North America')
china_import_year = china_import_year.drop('China')
china_import_year = china_import_year.drop('Latin America')
china_import_year = china_import_year.drop('Africa')
china_import_year = china_import_year.drop('United States')
china_import_year = china_import_year.drop('Oceania')
china_import_year = china_import_year.sort_values(by=['imports'], ascending=False)
# Find top export partners (without US)
china_export_year = china_export.filter(['CTYNAME', 'exports'], axis=1)
china_export_year = china_export_year.groupby('CTYNAME').sum()
# drop data that is a region(ex. africa, south america)/US
china_export_year = china_export_year.drop('North America')
china_export_year = china_export_year.drop('China')
china_export_year = china_export_year.drop('Latin America')
china_export_year = china_export_year.drop('United States')
china_export_year = china_export_year.drop('Oceania')
china_export_year = china_export_year.drop('Africa')
china_export_year = china_export_year.sort_values(by=['exports'], ascending=False)
# Find top trade partners (without US)
china_ie_year = pd.concat([china_import_year, china_export_year], axis=1)
china_ie_year['SUM'] = china_ie_year.imports + china_ie_year.exports
china_ie_year = china_ie_year.sort_values(by=['SUM'], ascending=False)
# Visualize to get a clear picture of China's top ten trade partners
china_import_year_top = china_import_year[:10]
china_export_year_top = china_export_year[:10]
china_ie_year_top = china_ie_year[:10]
sns.set(rc={'figure.figsize':(15,6)})
fig = plt.figure()
ax1 = plt.subplot(131)
sns.barplot(x=china_import_year_top.imports, y=china_import_year_top.index).set_title('Top 10 Import Partners', fontsize=14)
ax2 = plt.subplot(132)
sns.barplot(x=china_export_year_top.exports, y=china_export_year_top.index).set_title('Top 10 Export Partners', fontsize=14)
ax3 = plt.subplot(133)
sns.barplot(x=china_ie_year_top.SUM, y=china_ie_year_top.index).set_title('Top 10 Total Trade Amount(Import+Export) Partners', fontsize=14)
ax1.xaxis.label.set_visible(False)
ax1.ticklabel_format(axis="x", style="sci", scilimits=(0,0))
ax2.xaxis.label.set_visible(False)
ax3.xaxis.label.set_visible(False)
ax1.yaxis.label.set_visible(False)
ax2.yaxis.label.set_visible(False)
ax3.yaxis.label.set_visible(False)
fig.text(0.5, 0, 'Total Amount(Millions of US dollars)', ha='center', fontsize=14)
fig.text(0, 0.5, 'China Business partners', va='center', rotation='vertical', fontsize=14)
fig.tight_layout()
plt.show()
#fig.savefig('china_trade_partner.png')
From the above plots, we can identify China's primary trade partners. As before, we remove China-US trade because we want to focus on the impact on other business partners.
Japan, Hong Kong and South Korea are the top 3 partners by total trade amount with China. Besides, most of the top 10 business partners are the same across the three rankings, only in a different order.
We then look into segmented regression analysis on different China trade partners.
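For reference, a fitted segmented regression can be read as two straight lines joined at the intervention. The sketch below evaluates them the same way the plotting cells do, assuming the time index starts at 1 and the post-period slope term counts months since the last pre-intervention month. The function name `its_predict` and the coefficient values are illustrative only.

```python
def its_predict(t, n_pre, beta_0, beta_1, beta_2, beta_3):
    """Evaluate the segmented regression line at month index t.

    beta_0: intercept, beta_1: pre-intervention slope,
    beta_2: level change at the intervention, beta_3: slope change afterwards.
    n_pre is the number of months up to and including the intervention month.
    """
    baseline = beta_0 + beta_1 * t
    if t <= n_pre:
        return baseline
    return baseline + beta_2 + beta_3 * (t - n_pre)

# Invented coefficients, chosen only to show the level drop and the flatter slope.
n_pre = 27
for t in (1, 27, 28, 48):
    print(t, its_predict(t, n_pre, 100.0, 2.0, -10.0, -1.0))
```

In this reading, `beta_2` is the immediate jump at the intervention and `beta_3` is how much the monthly slope changes afterwards, which is why the impact measure compares `beta_3` with `beta_1`.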
# Create a monthly dataframe for each of the top 10 import/export partners
def create_china_ie_top10(CTYNAME, df, ie):
    top10 = pd.DataFrame(columns = ['time', ie])
    tmp = df.loc[df['CTYNAME'] == CTYNAME]
    # The time label is built from the row position, so rows must be in chronological order starting at 2016-01
    for i, tuples in enumerate(tmp.itertuples(), 0):
        top10.loc[i] = [str(2016 + i // 12) + '-' + str(i % 12 + 1), tuples[2]]
    return top10
china_import_top10 = china_import_year_top.index
china_import_df = []
china_export_top10 = china_export_year_top.index
china_export_df = []
for i in range(10):
    china_import_df.append(create_china_ie_top10(china_import_top10[i], china_import, 'imports'))
    china_export_df.append(create_china_ie_top10(china_export_top10[i], china_export, 'exports'))
china_import_df[0].head()
china_export_df[0].head()
# Analysis on import partners
china_imports_model = []
china_imports_its = []
for partners in china_import_df:
    # Convert the datatype to datetime/numeric
    partners.time = pd.to_datetime(partners.time)
    partners.imports = pd.to_numeric(partners.imports)
    # Remove time >=2020-01
    partners = partners.loc[partners['time'] < '2020-01']
    # Add ITS features (time_feature, intervention, postslope)
    its_import = add_its_features(partners, "time", "2018-03")
    # Declare the model for imports segmented regression analysis
    model_imports = smf.ols(formula='imports ~ time_feature + C(intervention) + postslope', data=its_import)
    # Fit the model (OLS is deterministic, so the seed below does not change the coefficients)
    np.random.seed(42)
    res_imports = model_imports.fit()
    china_imports_model.append(res_imports)
    china_imports_its.append(its_import)
# Print the summary output
print(china_imports_model[0].summary())
# Analysis on export partners
china_exports_model = []
china_exports_its = []
for partners in china_export_df:
    # Convert the datatype to datetime/numeric
    partners.time = pd.to_datetime(partners.time)
    partners.exports = pd.to_numeric(partners.exports)
    # Remove time >=2020-01
    partners = partners.loc[partners['time'] < '2020-01']
    # Add ITS features (time_feature, intervention, postslope)
    its_export = add_its_features(partners, "time", "2018-03")
    # Declare the model for exports segmented regression analysis
    model_exports = smf.ols(formula='exports ~ time_feature + C(intervention) + postslope', data=its_export)
    # Fit the model (OLS is deterministic, so the seed below does not change the coefficients)
    np.random.seed(42)
    res_exports = model_exports.fit()
    china_exports_model.append(res_exports)
    china_exports_its.append(its_export)
# Print the summary output
print(china_exports_model[7].summary())
# Plot ITS to see the result on export partners
fig, axs = plt.subplots(2, 5, figsize=(40,10))
for i, ax in enumerate(fig.axes):
    # Retrieve the coefficients of the segmented regression model
    beta_0, beta_2, beta_1, beta_3 = china_exports_model[i].params # intercept, intervention, time_feature, postslope
    # Generate datapoints for the pre-period
    pre = china_exports_its[i][china_exports_its[i]["time"] <= "2018-03"]
    pre_month_num = len(pre)
    X_plot_pre = np.linspace(1, pre_month_num, 100)
    Y_plot_pre = beta_0 + beta_1 * X_plot_pre
    # Generate datapoints for the post-period
    X_plot_post = np.linspace(pre_month_num+1, len(china_exports_its[i]), 100)
    Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
    # Visualization
    ax.scatter(x=china_exports_its[i]["time_feature"], y=china_exports_its[i]["exports"], color='r')
    # Set the axis and format
    ax.set_title("Bilateral Exports (from China to "+ china_export_top10[i]+ ") Trend", loc="center", fontsize=14, weight="bold")
    ax.set_xlabel("Time (Months)")
    ax.set_xticks(list(range(0, len(china_exports_its[i]), 6)))
    ax.set_ylabel("Total Amount (millions of U.S. dollars)")
    ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p:format(int(x), ',')))
    # Plot the two regression lines (pre/post)
    ax.plot(X_plot_pre, Y_plot_pre, color="black", label="Trend Pre-Trade War")
    ax.plot(X_plot_post, Y_plot_post, color="gray", label="Trend Post-Trade War")
    # Mark the position of the intervention
    ax.axvline(pre_month_num + 0.5, color="black", linestyle="--")
    ax.text(pre_month_num + 2.5, min(china_exports_its[i]["exports"]), "2018-03", ha="center")
plt.tight_layout()
plt.show()
# Plot ITS to see the result on import partners
fig, axs = plt.subplots(2, 5, figsize=(40,10))
for i, ax in enumerate(fig.axes):
    # Retrieve the coefficients of the segmented regression model
    beta_0, beta_2, beta_1, beta_3 = china_imports_model[i].params # intercept, intervention, time_feature, postslope
    # Generate datapoints for the pre-period
    pre = china_imports_its[i][china_imports_its[i]["time"] <= "2018-03"]
    pre_month_num = len(pre)
    X_plot_pre = np.linspace(1, pre_month_num, 100)
    Y_plot_pre = beta_0 + beta_1 * X_plot_pre
    # Generate datapoints for the post-period (use the China series length, not the US one)
    X_plot_post = np.linspace(pre_month_num+1, len(china_imports_its[i]), 100)
    Y_plot_post = beta_0 + beta_1 * X_plot_post + beta_2 * 1 + beta_3 * (X_plot_post-pre_month_num)
    # Visualization
    ax.scatter(x=china_imports_its[i]["time_feature"], y=china_imports_its[i]["imports"], color='r')
    # Set the axis and format
    ax.set_title("Bilateral Imports (from "+ china_import_top10[i]+ " to China) Trend", loc="center", fontsize=14, weight="bold")
    ax.set_xlabel("Time (Months)")
    ax.set_xticks(list(range(0, len(china_imports_its[i]), 6)))
    ax.set_ylabel("Total Amount (millions of U.S. dollars)")
    ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p:format(int(x), ',')))
    # Plot the two regression lines (pre/post)
    ax.plot(X_plot_pre, Y_plot_pre, color="black", label="Trend Pre-Trade War")
    ax.plot(X_plot_post, Y_plot_post, color="gray", label="Trend Post-Trade War")
    # Mark the position of the intervention
    ax.axvline(pre_month_num + 0.5, color="black", linestyle="--")
    ax.text(pre_month_num + 2.5, min(china_imports_its[i]["imports"]), "2018-03", ha="center")
plt.tight_layout()
plt.show()
From the import ITS plots, China's imports from South Korea, Brazil, Japan and Germany declined over time after the trade war began. Imports from the remaining partners did not change much. On the other hand, exports from China to other places were not much affected by the trade war, except for India and Germany, where exports rose at the time of the intervention but then declined. For all of China's export partners, the overall export amount was greater than before the trade war.
To look deeper, we calculate the trade war impact and create a bar plot for comparison.
# More analysis on top10 import/export partners
# We use data from ITS reports and propose a new formula as before
import_impact = []
export_impact = []
for i in range(10):
    # params order: intercept, intervention, time_feature, postslope
    imp_params = china_imports_model[i].params
    exp_params = china_exports_model[i].params
    # impact = postslope coefficient / |time_feature coefficient|
    import_impact.append(imp_params.iloc[3] / abs(imp_params.iloc[2]))
    export_impact.append(exp_params.iloc[3] / abs(exp_params.iloc[2]))
# Visualize
data_import = {'CTYNAME': china_import_top10, 'imports': import_impact}
data_export = {'CTYNAME': china_export_top10, 'exports': export_impact}
china_import_impact = pd.DataFrame(data=data_import)
china_export_impact = pd.DataFrame(data=data_export)
china_import_impact = china_import_impact.sort_values(by=['imports'], ascending=False)
china_export_impact = china_export_impact.sort_values(by=['exports'], ascending=False)
sns.set(rc={'figure.figsize':(15,6)})
fig = plt.figure()
ax1 = plt.subplot(121)
sns.barplot(x=china_import_impact.CTYNAME, y=china_import_impact.imports, data=china_import_impact, capsize=.05, palette="Reds_r").set_title('Impact on Top 10 Import Partners', fontsize=14)
ax2 = plt.subplot(122)
sns.barplot(x=china_export_impact.CTYNAME, y=china_export_impact.exports, data=china_export_impact, capsize=.05, palette="Reds_d").set_title('Impact on Top 10 Export Partners', fontsize=14)
ax1.set_xticklabels(china_import_impact.CTYNAME, rotation=45)
ax2.set_xticklabels(china_export_impact.CTYNAME, rotation=45)
ax1.xaxis.label.set_visible(False)
ax2.xaxis.label.set_visible(False)
ax1.yaxis.label.set_visible(False)
ax2.yaxis.label.set_visible(False)
fig.text(0.5, 0, 'Business Partners', ha='center', fontsize=14)
fig.text(-0.01, 0.5, 'Trade War Impact', va='center', rotation='vertical', fontsize=14)
fig.tight_layout()
plt.show()
#fig.savefig('china_impact.png')
From the above plot, we can easily compare the trade war impact on different business partners. Among China's import partners, South Korea shows the most negative impact, while the others show little positive or negative impact; most import partners show a negative impact. On the other hand, among China's export partners, the United Kingdom and Singapore were influenced the most: the United Kingdom has a positive impact above 15 and Singapore above 5. The rest of the export partners show a relatively small negative impact.
From the above analysis, the trade war between the US and China does not seem to have had a significant impact on import/export trade amounts with most of their primary business partners. Since the US and China are the two main trading nations in the world, other business partners are unlikely to reduce their trade with either of them.
To better understand the trade relationships between other countries and the US and China, we draw a world map showing, for each year from 1996 to 2018, whether each country had a larger total trade amount with the US or with China.
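The quantity shown on the map is the difference between a country's total trade with the US and its total trade with China in a given year, with a missing China figure treated as zero. A minimal sketch with invented numbers follows. The dictionaries and the function name `trade_diff` are hypothetical stand-ins for the merged dataframes.

```python
def trade_diff(us_totals, china_totals):
    """Return {(year, country): US total minus China total} for keys in us_totals.

    Mirrors a left join on the US table: countries missing from the China
    table are treated as having zero trade with China.
    """
    return {
        key: us_value - china_totals.get(key, 0.0)
        for key, us_value in us_totals.items()
    }

# Invented totals in millions of US dollars.
us_totals = {(1996, "Japan"): 180.0, (1996, "Chad"): 1.0}
china_totals = {(1996, "Japan"): 60.0}
print(trade_diff(us_totals, china_totals))
```

A positive value means the country traded more with the US that year, a negative value means it traded more with China.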
# create annual imports+exports dataframe for US and China respectively
us_world_trade = us_ie.drop('CTY_CODE', axis=1)
us_world_trade = pd.concat([us_world_trade.iloc[:, :2], us_world_trade.iloc[:, 14:15], us_world_trade.iloc[:, 27:28]], axis=1)
us_world_trade['SUM_us'] = us_world_trade.IYR + us_world_trade.EYR
china_index = us_world_trade[(us_world_trade.CTYNAME == 'China')].index
us_world_trade = us_world_trade.drop(china_index)
us_world_trade = us_world_trade.drop('IYR', axis=1)
us_world_trade = us_world_trade.drop('EYR', axis=1)
us_world_trade = us_world_trade.loc[us_world_trade['year'] >= 1996]
us_world_trade = us_world_trade.loc[us_world_trade['year'] <= 2018]
us_world_trade.head()
# To cover a longer time span, we use another China partner dataset from the WTO (https://data.wto.org/)
CHINA_IMPORT_PATH = "data/china_import.csv"
china_longer_import = pd.read_csv(CHINA_IMPORT_PATH, encoding="latin-1")
china_longer_import = china_longer_import.filter(['Partner Economy', 'Year', 'Value'], axis=1)
china_longer_import = china_longer_import.sort_values(by=['Year'])
frame = []
for i in range(23):
    tmp_yr = china_longer_import[china_longer_import['Year'] == 1996+i]
    tmp_yr = tmp_yr.groupby('Partner Economy')['Value'].sum()
    tmp_yr = tmp_yr.to_frame()
    tmp_yr['Year'] = [1996+i for x in range(len(tmp_yr)) ]
    frame.append(tmp_yr)
china_new_import = pd.concat(frame)
CHINA_EXPORT_PATH = "data/china_export.csv"
china_longer_export = pd.read_csv(CHINA_EXPORT_PATH, encoding="latin-1")
china_longer_export = china_longer_export.filter(['Reporting Economy', 'Year', 'Value'], axis=1)
china_longer_export = china_longer_export.sort_values(by=['Year'])
frame = []
for i in range(23):
    tmp_yr = china_longer_export[china_longer_export['Year'] == 1996+i]
    tmp_yr = tmp_yr.groupby('Reporting Economy')['Value'].sum()
    tmp_yr = tmp_yr.to_frame()
    tmp_yr['Year'] = [1996+i for x in range(len(tmp_yr)) ]
    frame.append(tmp_yr)
china_new_export = pd.concat(frame)
china_new_import.rename({ 'Year': 'year', 'Value':'Import'}, axis=1, inplace=True)
china_new_export.rename({ 'Year': 'year', 'Value':'Export'}, axis=1, inplace=True)
china_new_import = china_new_import.reset_index()
china_new_import.rename({'Partner Economy': 'CTYNAME'}, axis=1, inplace=True)
china_new_export = china_new_export.reset_index()
china_new_export.rename({'Reporting Economy': 'CTYNAME'}, axis=1, inplace=True)
china_new_ie = pd.merge(china_new_import, china_new_export, how='left', on=['CTYNAME', 'year'])
china_new_ie['Import'] = china_new_ie['Import'].fillna(0)
china_new_ie['Export'] = china_new_ie['Export'].fillna(0)
china_new_ie['SUM_china'] = china_new_ie.Import + china_new_ie.Export
china_new_ie = china_new_ie.drop('Import', axis=1)
china_world_trade = china_new_ie.drop('Export', axis=1)
# change unit to millions of US dollars
china_world_trade['SUM_china'] = china_world_trade['SUM_china']/1000000
# rearrange column order
china_world_trade = china_world_trade[['year', 'CTYNAME','SUM_china']]
china_world_trade.head()
# combine US and China trade partners
world_trade = pd.merge(us_world_trade, china_world_trade, how='left', on=['year', 'CTYNAME'])
world_trade['SUM_us'] = world_trade['SUM_us'].fillna(0)
world_trade['SUM_china'] = world_trade['SUM_china'].fillna(0)
world_trade['DIFF'] = world_trade.SUM_us - world_trade.SUM_china
world_trade = world_trade.drop('SUM_us', axis=1)
world_trade = world_trade.drop('SUM_china', axis=1)
# drop US/China itself
china_index = world_trade[(world_trade.CTYNAME == 'China')].index
us_index = world_trade[(world_trade.CTYNAME == 'United States of America')].index
world_trade = world_trade.drop(china_index)
world_trade = world_trade.drop(us_index)
world_trade.head()
# Construct our world map
# Function to convert country names to alpha-3 country codes
def get_continent(col):
    try:
        cn_a3_code = country_name_to_country_alpha3(col)
    except Exception:
        # Names that cannot be mapped are marked 'Unknown' and filtered out below
        cn_a3_code = 'Unknown'
    return cn_a3_code
country_code = []
for i in world_trade['CTYNAME']:
    country_code.append(get_continent(i))
world_trade['code'] = country_code
world_trade = world_trade.loc[world_trade['code'] != 'Unknown']
# Use a separate name so the imported plotly `layout` module is not overwritten when the cell is re-run
map_template = dict(layout=dict(geo=layout.Geo(showcountries=True, showlakes=False, showland=True, landcolor='#f0f0f0')))
fig = px.choropleth(world_trade, locations="code", color="DIFF", hover_name="CTYNAME", animation_frame="year",
                    color_continuous_scale=px.colors.diverging.RdBu, title='<b>US/China Business Partners</b> from 1996 to 2018',
                    range_color=[-100000,100000], height=600, template=map_template)
fig.update_layout(
    title={'x':0.05, 'xanchor': 'left'})
fig.show()
#plotly.offline.plot(fig, filename='world_map.html', image_width=100, image_height=100)
We can see that at the beginning, total trade with the U.S. was greater than total trade with China in almost every region. However, China rose rapidly over the following years and has become one of the top economies in the world today.
# read the dataframe which represents US exports and imports trades with other countries from year 2015 to 2020
df = pd.read_excel("./data/trades_by_goods.xlsx")
# extract exports and imports trades with China (copy so the in-place drop below acts on an independent dataframe)
df_china = df[df["Country"]=="China"].copy()
# get rid of unnecessary columns and missing data(in this dataframe values are all 0 for the year 2020)
df_china.drop(['Country', 'CTY_CODE'], axis=1, inplace=True)
df_china = df_china[df_china['Year']!=2020]
#df_china
# rearranging the dataframe to facilitate our analysis work afterwards
df_exports = pd.melt(df_china, id_vars=['Year', 'SITC'], value_vars=['ExportsFASValueBasisJan', 'ExportsFASValueBasisFeb',\
'ExportsFASValueBasisMar', 'ExportsFASValueBasisApr',\
'ExportsFASValueBasisMay', 'ExportsFASValueBasisJun',\
'ExportsFASValueBasisJul', 'ExportsFASValueBasisAug',\
'ExportsFASValueBasisSep', 'ExportsFASValueBasisOct',\
'ExportsFASValueBasisNov', 'ExportsFASValueBasisDec'])
df_imports = pd.melt(df_china, id_vars=['Year', 'SITC'], value_vars=['GenImportsCustomsValBasisJan', 'GenImportsCustomsValBasisFeb',\
'GenImportsCustomsValBasisMar', 'GenImportsCustomsValBasisApr',\
'GenImportsCustomsValBasisMay', 'GenImportsCustomsValBasisJun',\
'GenImportsCustomsValBasisJul', 'GenImportsCustomsValBasisAug',\
'GenImportsCustomsValBasisSep', 'GenImportsCustomsValBasisOct',\
'GenImportsCustomsValBasisNov', 'GenImportsCustomsValBasisDec'])
# create a column representing the time
month_mapping = {'Jan':1, 'Feb':2, 'Mar':3, 'Apr':4, 'May':5, 'Jun':6, 'Jul':7, 'Aug':8, 'Sep':9, 'Oct':10,\
'Nov':11,'Dec':12}
df_exports['variable'] = df_exports['variable'].apply(lambda x: month_mapping[x[-3:]])
df_imports['variable'] = df_imports['variable'].apply(lambda x: month_mapping[x[-3:]])
time_exports = pd.DataFrame({'year': list(df_exports['Year']),
'month': list(df_exports['variable']),
'day': [1 for i in range(len(df_exports))]})
time_imports = pd.DataFrame({'year': list(df_imports['Year']),
'month': list(df_imports['variable']),
'day': [1 for i in range(len(df_imports))]})
df_exports['time'] = pd.to_datetime(time_exports)
df_imports['time'] = pd.to_datetime(time_imports)
# take a look at what df_exports and df_imports look like
df_exports.head(5)
# define a look-up from the SITC number to its category name
sectors = list(df_china['sitc_sdesc'])
# Take a general look at the exports of different industries over the years
pyo.init_notebook_mode()
data = []
for i in range(10):
    # create 10 traces which represent 10 different industries
    df = df_exports[df_exports['SITC']==i]
    trace = {'x': df['time'], 'y': df['value'], 'name': sectors[i], 'type': 'bar'}
    data.append(trace)
layout = {'xaxis': {'title': 'Time'}, 'barmode': 'relative', 'title': 'US exports to China for different industries'}
# Plot the figure
fig = go.Figure(data=data, layout=layout)
pyo.iplot(fig)
# imports
data_im = []
for i in range(10):
    # create 10 traces which represent 10 different industries
    df = df_imports[df_imports['SITC']==i]
    trace = {'x': df['time'], 'y': df['value'], 'name': sectors[i], 'type': 'bar'}
    data_im.append(trace)
layout = {'xaxis': {'title': 'Time'}, 'barmode': 'relative', 'title': 'US imports from China for different industries'}
# Plot the figure
fig = go.Figure(data=data_im, layout=layout)
pyo.iplot(fig)
We have observed that:
Trend: The overall trend (summed over all categories) is roughly consistent with the result we obtained before.
Proportion of each industry: For both imports and exports, the category Machinery and transport equipment contributes the largest share.
For exports, Crude Materials, Inedible, Except Fuels is in second place. Notably, it has a clear seasonal pattern with a period of approximately 1 year, and the peak always appears in October. However, the peak does not appear again in October 2019, after the trade war began.
Chemicals and related products, Miscellaneous manufactured articles, and Mineral fuels, lubricants and related materials follow, etc.
For imports, the second largest category is Miscellaneous manufactured articles instead, followed by Manufactured goods classified chiefly by material.
We can also observe a 1-year seasonal pattern for Machinery and transport equipment, whose peak is also always around October.
October is a unique month. In the West, October is a transitional month as autumn slides relentlessly towards winter. The October effect refers to the psychological anticipation that financial declines and stock market crashes are more likely to occur during this month than in any other month. This may explain the seasonality: the series tend to increase until October and begin to decrease afterwards.
We will give a clearer and more detailed visualization for each kind of goods in the following ITS analysis.
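The October peak noted above could be checked numerically by finding, for each year, the month with the largest value. The sketch below does this on synthetic data. The function name `peak_months` and the data are made up for illustration and do not reproduce the trade figures.

```python
from collections import Counter

def peak_months(series):
    """series: iterable of (year, month, value). Return {year: month of the max value}."""
    best = {}
    for year, month, value in series:
        if year not in best or value > best[year][1]:
            best[year] = (month, value)
    return {year: month for year, (month, _) in best.items()}

# Synthetic data: the peak falls in October for 2016-2018 and in March for 2019.
data = []
for year in (2016, 2017, 2018, 2019):
    peak = 3 if year == 2019 else 10
    for month in range(1, 13):
        data.append((year, month, 100.0 - 5.0 * abs(month - peak)))

peaks = peak_months(data)
print(peaks)
print(Counter(peaks.values()).most_common(1))
```

A year whose peak month departs from the usual one, like 2019 in this toy series, stands out immediately in the resulting dictionary.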
impact_imports = []
impact_exports = []
To reduce the noise as much as possible, we remove the seasonal pattern, keeping only the trend and the residual, and observe whether the linear regression performs better.
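To make the idea concrete, here is a deliberately simplified sketch of additive seasonal adjustment. It is not the algorithm used by `seasonal_decompose`, which estimates the trend with a moving average first. Instead it estimates the seasonal effect of each position in the cycle as the mean at that position minus the overall mean, and subtracts it. The function name `deseasonalize` and the series are invented.

```python
def deseasonalize(values, period=12):
    """Subtract an additive seasonal component estimated from per-position means.

    Simplified stand-in: the seasonal effect of position k is the mean of all
    observations at that position in the cycle minus the overall mean.
    Assumes len(values) >= period.
    """
    overall = sum(values) / len(values)
    seasonal = []
    for k in range(period):
        same_position = values[k::period]
        seasonal.append(sum(same_position) / len(same_position) - overall)
    return [v - seasonal[i % period] for i, v in enumerate(values)]

# Synthetic series: a flat level of 100 with a bump of +30 in the 10th month of every cycle.
raw = [100.0 + (30.0 if i % 12 == 9 else 0.0) for i in range(48)]
adjusted = deseasonalize(raw)
print(min(adjusted), max(adjusted))
```

In an additive decomposition the observed series equals trend + seasonal + residual, so keeping trend + residual is the same as subtracting the seasonal component from the observed values.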
# To observe the difference of R-squared before and after removing the seasonality
rs_difference_exports = []
rs_difference_imports = []
for i in range(10):
    # Removing the seasonality
    df = df_exports[df_exports['SITC']==i].sort_values(by='time')
    dates = pd.DatetimeIndex([d for d in df['time']])
    df.set_index(dates, inplace=True)
    result_exports = seasonal_decompose(df['value'], model='additive', extrapolate_trend='freq')
    # Keep trend + residual, i.e. the series with the seasonal component removed
    df["exports_trend"] = result_exports.trend + result_exports.resid
    df = add_its_features(df, "time", "2018-03")
    model_naive = smf.ols(formula='value ~ time_feature + C(intervention) + postslope', data=df)
    model = smf.ols(formula='exports_trend ~ time_feature + C(intervention) + postslope', data=df)
    res_naive = model_naive.fit()
    res = model.fit()
    rs_difference_exports.append(res.rsquared - res_naive.rsquared)
    # Impact: postslope coefficient divided by the time_feature coefficient
    impact_exports.append(res.params.iloc[3]/res.params.iloc[2])
    plot_its_result(df, res, "time", "value", "2018-03", sectors[i])
for i in range(10):
    # Removing the seasonality
    df = df_imports[df_imports['SITC']==i].sort_values(by='time')
    dates = pd.DatetimeIndex([d for d in df['time']])
    df.set_index(dates, inplace=True)
    result_imports = seasonal_decompose(df['value'], model='additive', extrapolate_trend='freq')
    # Keep trend + residual, i.e. the series with the seasonal component removed
    df["imports_trend"] = result_imports.trend + result_imports.resid
    df = add_its_features(df, "time", "2018-03")
    model_naive = smf.ols(formula='value ~ time_feature + C(intervention) + postslope', data=df)
    model = smf.ols(formula='imports_trend ~ time_feature + C(intervention) + postslope', data=df)
    res = model.fit()
    res_naive = model_naive.fit()
    rs_difference_imports.append(res.rsquared - res_naive.rsquared)
    # Impact: postslope coefficient divided by the time_feature coefficient
    impact_imports.append(res.params.iloc[3]/res.params.iloc[2])
    plot_its_result(df, res, "time", "value", "2018-03", sectors[i])
print(rs_difference_exports)
print(rs_difference_imports)
Comparing the R² before and after removing the seasonality, we find that for most categories the regression explains the data better after the seasonality is removed.
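As a reminder of what is being compared, R² is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch follows. The function name `r_squared` and the numbers are illustrative and are not taken from the fitted models.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, observed))              # 1.0, a perfect fit
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # 0.0, no better than the mean
```

One caveat worth keeping in mind: the two models here are fitted to different response variables (the raw values and the deseasonalized values), so the difference in R² is best read as a rough indication and not as a strict model comparison.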
df_impact_exports = pd.DataFrame(data={'commodity': sectors[:10], 'impact': impact_exports})
df_impact_exports.sort_values(by='impact', inplace=True)
df_impact_imports = pd.DataFrame(data={'commodity': sectors[:10], 'impact': impact_imports})
df_impact_imports.sort_values(by='impact', inplace=True)
# Take a look at the quantified impact that the trade war has on different industries
sns.set(rc={'figure.figsize':(15,10)})
# Draw both bar charts side by side in a single figure
fig, (ax1, ax2) = plt.subplots(1, 2)
sns.barplot(x=df_impact_exports['impact'], y=df_impact_exports['commodity'], ax=ax1).set_title('Impact on Commodity Wise Export')
sns.barplot(x=df_impact_imports['impact'], y=df_impact_imports['commodity'], ax=ax2).set_title('Impact on Commodity Wise Import')
plt.tight_layout()
plt.show()
In the first plot, the impact on U.S. exports is negative for every sector except beverages and tobacco. However, the estimate for that sector is not very reliable: its individual ITS plot above shows a few clear peaks whose values far exceed those of the other months. We therefore remove those outliers, refit the model on the remaining observations, and obtain the final comparison below.
# Drop the outlier observations at time_feature values 3, 14, 27 and 39
df1 = df_exports[df_exports['SITC']==1].sort_values(by='time')
dates = pd.DatetimeIndex(df1['time'])
df1.set_index(dates, inplace=True)
df1 = add_its_features(df1, "time", "2018-03")
df1 = df1[~df1['time_feature'].isin([3, 14, 27, 39])]
# ITS fit after removing those values
model = smf.ols(formula='value ~ time_feature + C(intervention) + postslope', data=df1)
res = model.fit()
impact_excluded_outliers = res.params.iloc[3]/res.params.iloc[2]
# update the new impact of BEVERAGES AND TOBACCO
df_impact_exports.loc[(df_impact_exports.commodity == 'BEVERAGES AND TOBACCO'),'impact'] = impact_excluded_outliers
# Visualization of impact
fig = go.Figure(data=[
    go.Bar(name='Exports', x=list(df_impact_exports['commodity']), y=list(df_impact_exports['impact'])),
    go.Bar(name='Imports', x=list(df_impact_imports['commodity']), y=list(df_impact_imports['impact']))
])
#fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()
Click on Exports and Imports in the legend to see each series individually.
The impact is negative for almost every industry, for both imports and exports.
Comparing the impact across industries, the most negative impact is on animal and vegetable oils, fats and waxes, for both exports and imports and especially for imports. Beverages and tobacco is the least affected category, for both exports and imports.
Comparing exports and imports, U.S. imports are affected more by the trade war than U.S. exports.
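The impact values compared above come from the ratio `res.params[3]/res.params[2]`. The sketch below shows one way to read such a ratio, under assumptions that should be checked against `add_its_features`: that `time_feature` counts months from the start of the series, that `postslope` counts months since the intervention (0 before it), and that the fitted coefficients are ordered intercept, `intervention`, `time_feature`, `postslope`. The series is synthetic and every number in it is hypothetical.

```python
import numpy as np

# Synthetic monthly series (hypothetical numbers, not the trade data):
# pre-intervention slope of 2.0 per month, slope change of -1.0 afterwards.
rng = np.random.default_rng(42)
n_months, start = 60, 40
time_feature = np.arange(n_months, dtype=float)
intervention = (time_feature >= start).astype(float)
postslope = np.where(time_feature >= start, time_feature - start, 0.0)
value = (50 + 2.0 * time_feature - 3.0 * intervention - 1.0 * postslope
         + rng.normal(0, 0.5, n_months))

# Ordinary least squares with columns in the assumed coefficient order:
# [intercept, intervention, time_feature, postslope]
X = np.column_stack([np.ones(n_months), intervention, time_feature, postslope])
coef, *_ = np.linalg.lstsq(X, value, rcond=None)

# Slope change after the intervention, relative to the pre-intervention slope
impact = coef[3] / coef[2]
print(f"pre-slope={coef[2]:.2f}, slope change={coef[3]:.2f}, impact={impact:.2f}")
```

Under these assumptions, an impact near -0.5 would mean that the slope change after the intervention cancels about half of the pre-intervention monthly growth. The ratio becomes unstable when the pre-intervention slope is close to zero, so sectors with a nearly flat pre-intervention trend deserve extra caution when ranked this way.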